Using feature structures as a unifying representation format for corpora exploration
Abstract
In this paper we report on the use of feature structures to represent the linguistic information of a corpus. This approach has been adopted in TyPTex, a project that aims to provide a generic architecture for corpora profiling. After a brief overview of the TyPTex project, we show that corpora exploration requires manipulating linguistic features, either to reach a required level of linguistic information or to change the set of features and obtain a new point of view on the data. We show that the feature structure formalism supports the building and management of linguistic features by means of Meta-Rules based on unification. Finally, we give an example of marking that combines the projection of information from a static lexicon with contextual marking via Meta-Rules. The results suggest that feature structures can improve the coverage and reliability of the marking.
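To make the idea of unification-based Meta-Rules concrete, the following is a minimal sketch and not the TyPTex implementation. It assumes feature structures can be modelled as nested Python dictionaries with atomic string values, and that a Meta-Rule is a pair of feature structures: a condition that must unify with a token and an addition that is merged into it. The names `unify`, `apply_meta_rule`, and the feature labels `cat`, `agr`, `num`, and `pers` are hypothetical and chosen only for illustration.

```python
def unify(a, b):
    """Return the unification of two feature structures, or None on a clash.

    Assumption: a feature structure is a nested dict whose leaves are
    atomic values (strings). Reentrancy (shared substructures) is not modelled.
    """
    result = dict(a)
    for feature, b_value in b.items():
        if feature not in result:
            # Feature only present in b: simply add it.
            result[feature] = b_value
            continue
        a_value = result[feature]
        if isinstance(a_value, dict) and isinstance(b_value, dict):
            # Both values are complex: unify them recursively.
            merged = unify(a_value, b_value)
            if merged is None:
                return None
            result[feature] = merged
        elif a_value != b_value:
            # Two different atomic values (or atom vs. structure): clash.
            return None
    return result


def apply_meta_rule(token, condition, addition):
    """Hypothetical Meta-Rule: if `condition` unifies with `token`,
    return the token enriched with `addition`; otherwise return it unchanged."""
    if unify(token, condition) is None:
        return token
    enriched = unify(token, addition)
    return token if enriched is None else enriched


# A token as it might come out of a static lexicon lookup (illustrative values).
token = {"form": "books", "cat": "noun", "agr": {"num": "plural"}}

# A contextual rule that adds person information to nouns.
marked = apply_meta_rule(token, {"cat": "noun"}, {"agr": {"pers": "3"}})
print(marked)
```

In this sketch a rule fires whenever its condition is compatible with the token, so a token that lacks the tested feature also matches. A stricter variant would test subsumption instead of unifiability.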
Similar resources
SusTEInability of linguistic resources through feature structures
This article shows that the TEI tag set for feature structures can be adopted to represent a heterogeneous set of linguistic corpora. Most corpora are annotated using markup languages that are based on the Annotation Graph framework, on the upcoming Linguistic Annotation Format ISO standard, or on tag sets defined by or derived from the TEI guidelines. A unified representation co...
Optimization of an Offshore Jacket-Type Structure Using Meta-Heuristic Algorithms
Offshore jacket-type towers are steel structures designed and constructed in marine environments for various purposes such as oil exploration and exploitation units, oceanographic research, and undersea testing. In this paper a newly developed meta-heuristic algorithm, namely Cyclical Parthenogenesis Algorithm (CPA), is utilized for sizing optimization of a jacket-type offshore structure. The a...
An XML-based Representation Format for Syntactically Annotated Corpora
This paper discusses a general approach to the description and encoding of linguistic corpora annotated with hierarchically structured syntactic information. A general format can be motivated by the variety and incompatibility of existing annotation formats. By using XML as a representation format the theoretical and technical problems encountered can be overcome.
A New Approach towards Precise Planar Feature Characterization Using Image Analysis of FMI Image: Case Study of Gachsaran Oil Field Well No. 245, South West of Iran
Formation micro imager (FMI) can directly reflect changes of wall stratums and rock structures. Conventionally, FMI images mainly are analyzed with manual processing, which is extremely inefficient and incurs a heavy workload for experts. Iranian reservoirs are mainly carbonate reservoirs, in which the fractures have an important effect on permeability and petroleum production. In this paper, a...
Towards A Modular Data Model For Multi-Layer Annotated Corpora
In this paper we discuss the current methods in the representation of corpora annotated at multiple levels of linguistic organization (so-called multi-level or multi-layer corpora). Taking five approaches which are representative of the current practice in this area, we discuss the commonalities and differences between them focusing on the underlying data models. The goal of the paper is to ide...